Introduction
Monoclonal gammopathies, particularly multiple myeloma (MM), pose significant diagnostic challenges due to their complex cellular profiles. Flow cytometry performed according to EuroFlow standards offers detailed analysis but remains labor-intensive and prone to inter-observer variability. Machine learning can streamline this process and provide consistent, accurate diagnostic support. This study explores the application of machine learning to flow cytometry data for diagnosis and minimal residual disease (MRD) detection in MM.
Objectives
The primary objective was to develop and validate a machine learning model for accurate diagnosis and MRD detection in MM using flow cytometry data. Secondary objectives included balancing the dataset, improving plasma cell enrichment, and evaluating the model's performance in a clinical setting.
Methodology
We collected more than 800 samples studied according to EuroFlow standards for MM. The panel comprised two tubes, each listed as containing CD138, CD38, CD45, CD19, CD56, CD27, CD117, CD81, CD20, CD200, CD28, and cytoplasmic immunoglobulin kappa/lambda. Of these samples, 44% were obtained for diagnosis and 56% for MRD detection, and 90% of the diagnostic samples were confirmed as pathological by expert review.
Preprocessing included using the Bioconductor package flowAI to remove doublets, margins, and artifacts. A gating strategy was applied to enrich the analysis for plasma cells, extracting only positive events from each sample. To address dataset imbalances, we applied the Synthetic Minority Over-sampling Technique (SMOTE) in the training set.
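The oversampling step can be illustrated with a minimal, standard-library sketch of the core SMOTE idea: each synthetic point is placed on the segment between a minority-class sample and one of its nearest minority-class neighbours. This is a simplified illustration, not the implementation used in the study; the function name `smote_oversample`, the value of `k`, and the toy feature vectors are assumptions.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (simplified SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class feature vectors taken from the training set.
minority_train = [[0.10, 0.80], [0.15, 0.75], [0.20, 0.90], [0.12, 0.85]]
new_points = smote_oversample(minority_train, n_new=6, k=3)
print(len(new_points))
```

Because every synthetic point is an interpolation between two real minority samples, it stays within the range already spanned by that class. Applying the step to the training set, as described above, keeps synthetic samples out of the test set.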
We employed FlowSOM for clustering, extracting clusters and metaclusters from each tube, which were then fed into a random forest classifier. The model was trained and cross-validated on the training set, followed by independent validation on the test set.
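The abstract does not state how the cluster output was encoded for the classifier. One common choice, shown here purely as an assumption, is to describe each sample by the fraction of its events falling in each cluster or metacluster, and to concatenate the vectors from both tubes. The helper `cluster_proportions` and the toy labels are hypothetical.

```python
from collections import Counter


def cluster_proportions(event_clusters, n_clusters):
    """Turn per-event cluster labels for one sample into a fixed-length
    vector holding the fraction of events in each cluster."""
    counts = Counter(event_clusters)
    total = len(event_clusters)
    if total == 0:
        return [0.0] * n_clusters
    return [counts.get(c, 0) / total for c in range(n_clusters)]


# Hypothetical metacluster labels (0..3) for the gated events of one sample, per tube.
tube1_labels = [0, 0, 1, 3, 3, 3, 0, 1]
tube2_labels = [2, 2, 2, 0, 1, 1, 2, 2]

# Concatenate the per-tube vectors into one feature row for the classifier.
features = cluster_proportions(tube1_labels, 4) + cluster_proportions(tube2_labels, 4)
print(features)
```

A fixed-length row per sample is what a random forest needs as input, regardless of how many events each sample contains.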
Results
The training set comprised 1,000 samples with an equal distribution of class labels (500 positive, 500 negative). The random forest model exhibited robust performance during the training phase, achieving an out-of-bag (OOB) area under the curve (AUC) of 99.3%, a precision-recall (PR) AUC of 99.3%, and a Brier score of 0.04, indicating high accuracy. The OOB G-mean was 0.95, with a misclassification rate of 4.8%. The confusion matrix indicated a class error of 1.2% for the negative class and 8.4% for the positive class.
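The reported G-mean and misclassification rate follow directly from the per-class errors. The sketch below assumes the usual definitions: sensitivity and specificity are one minus the positive and negative class errors, and the G-mean is the square root of their product. Function names are illustrative.

```python
import math


def g_mean(neg_class_error, pos_class_error):
    """Geometric mean of sensitivity and specificity."""
    specificity = 1.0 - neg_class_error
    sensitivity = 1.0 - pos_class_error
    return math.sqrt(sensitivity * specificity)


def misclassification_rate(n_neg, neg_class_error, n_pos, pos_class_error):
    """Overall error rate as the size-weighted average of the class errors."""
    errors = n_neg * neg_class_error + n_pos * pos_class_error
    return errors / (n_neg + n_pos)


# Training-set figures quoted above: 500 negative, 500 positive samples.
train_g = g_mean(0.012, 0.084)
train_err = misclassification_rate(500, 0.012, 500, 0.084)
print(round(train_g, 2), round(train_err, 3))
```

With the training-set class errors of 1.2% and 8.4%, this gives a G-mean of about 0.95 and a misclassification rate of 4.8%, matching the reported values.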
The test set consisted of 138 samples. During the validation phase, the model maintained strong performance, with an AUC of 91.6%, a PR-AUC of 72%, and a Brier score of 0.10. The G-mean for the test set was 0.91, with a misclassification rate of 10.9%. The confusion matrix for the test set showed a class error of 2.3% for the negative class and 14.7% for the positive class. Further analysis of the test set revealed that 56 samples were diagnostic, with a misclassification rate of 5%, while 82 samples were obtained for MRD detection, where the rate increased to 13.58%.
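For reference, the Brier score is the mean squared difference between the predicted probability of the positive class and the observed 0/1 label, so lower values indicate better-calibrated predictions. The sketch below uses made-up labels and probabilities, not study data.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1 labels."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical labels and predicted probabilities for four samples.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.1]
score = brier_score(y_true, y_prob)
print(round(score, 3))
```

A model that always predicts 0.5 scores 0.25 on any dataset, which gives a rough baseline for reading the 0.04 (training) and 0.10 (test) values.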
In addition to the robust performance metrics, the implementation of our machine learning model within the SmartCytoFlow platform has significantly streamlined the diagnostic workflow. SmartCytoFlow automates data preprocessing, gating, clustering, and classification, providing real-time diagnostic support. The integration has resulted in a substantial reduction in analysis time and improved diagnostic consistency.
Conclusion
Our study underscores the efficacy of integrating machine learning with flow cytometry for the diagnosis and monitoring of monoclonal gammopathies, particularly multiple myeloma. This approach offers a promising adjunct to traditional diagnostic methods, supporting consistency and accuracy in clinical settings. Future validation in diverse cohorts is warranted to establish its broader applicability and utility in routine diagnostic practice.
Mateos: Pfizer: Honoraria, Membership on an entity's Board of Directors or advisory committees; Amgen: Honoraria, Membership on an entity's Board of Directors or advisory committees; BMS: Honoraria, Membership on an entity's Board of Directors or advisory committees; Johnson and Johnson: Honoraria, Membership on an entity's Board of Directors or advisory committees; Sanofi: Honoraria; GSK: Honoraria, Membership on an entity's Board of Directors or advisory committees; F. Hoffmann-La Roche Ltd: Honoraria, Membership on an entity's Board of Directors or advisory committees; Abbvie: Honoraria, Membership on an entity's Board of Directors or advisory committees; Regeneron: Honoraria; Stemline: Honoraria, Membership on an entity's Board of Directors or advisory committees; Kite: Honoraria, Membership on an entity's Board of Directors or advisory committees; Oncopeptides: Honoraria; Salamanca University: Current Employment; Celgene: Honoraria.
Mosquera Orgueira: GSK: Consultancy; Novartis: Other; Incyte: Other; Takeda: Speakers Bureau; Roche: Consultancy; Pfizer: Consultancy; Abbvie: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; AstraZeneca: Consultancy, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Biodigital THX: Current equity holder in private company; Janssen: Consultancy, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau.